An Efficient Transformer Decoder with Compressed Sub-layers
Authors
Abstract
The large attention-based encoder-decoder network (Transformer) has become prevailing recently due to its effectiveness. But the high computational complexity of its decoder raises an inefficiency issue. By examining the mathematical formulation of the decoder, we show that under some mild conditions, the architecture could be simplified by compressing its sub-layers, the basic building blocks of the Transformer, and achieves a higher parallelism. We thereby propose the Compressed Attention Network, whose decoder layer consists of only one sub-layer instead of three. Extensive experiments on 14 WMT machine translation tasks show that our model is 1.42x faster, with performance on par with a strong baseline. This baseline is already 2x faster than the widely used standard baseline, without loss in performance.
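The abstract contrasts a standard decoder layer (three sequential sub-layers: self-attention, cross-attention, feed-forward) with a compressed layer that uses only one. The NumPy sketch below only illustrates this structural idea; it is a hypothetical toy, not the authors' actual Compressed Attention Network, and the specific merged form (joint attention over target and encoder states with the feed-forward folded into the same residual branch) is an assumption for illustration.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention(q, k, v):
    """Scaled dot-product attention over row vectors."""
    return softmax(q @ k.T / np.sqrt(q.shape[-1])) @ v

rng = np.random.default_rng(0)
d, n_tgt, n_src = 8, 5, 7
proj = lambda: 0.1 * rng.standard_normal((d, d))  # fresh toy weight matrix

def standard_decoder_layer(x, memory):
    """Baseline: three sequential sub-layers (LayerNorm and masking omitted)."""
    x = x + attention(x @ proj(), x @ proj(), x @ proj())            # 1) self-attention
    x = x + attention(x @ proj(), memory @ proj(), memory @ proj())  # 2) cross-attention
    x = x + np.maximum(x @ proj(), 0.0) @ proj()                     # 3) feed-forward
    return x

def compressed_decoder_layer(x, memory):
    """Hypothetical merged form: one sub-layer attends over [x; memory]
    and folds the feed-forward into the same residual branch."""
    kv = np.vstack([x, memory])                   # joint keys/values
    a = attention(x @ proj(), kv @ proj(), kv @ proj())
    return x + np.maximum(a @ proj(), 0.0) @ proj()

x = rng.standard_normal((n_tgt, d))
memory = rng.standard_normal((n_src, d))
print(standard_decoder_layer(x, memory).shape)    # (5, 8)
print(compressed_decoder_layer(x, memory).shape)  # (5, 8)
```

The compressed layer performs a single residual update per layer rather than three, which is the source of the parallelism/speed argument in the abstract; the exact weight sharing and conditions under which this preserves quality are given in the paper itself.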
Similar Resources
An Efficient Joint Source-Channel Decoder with Dynamical Block Priors
An efficient joint source-channel (s/c) decoder based on the side information of the source and on the MN-Gallager algorithm over Galois fields is presented. The dynamical block priors (DBP) are derived either from a statistical mechanical approach via calculation of the entropy for the correlated sequences, or from the Markovian transition matrix. The Markovian joint s/c decoder has many advan...
Full text
An Efficient Low Power Viterbi Decoder
This paper presents an efficient low-power Viterbi decoder design using the T-algorithm. It implements the Viterbi decoder using the T-algorithm for decoding a bit-stream encoded by a corresponding forward-error-correction convolutional encoding system. Many digital communication systems incorporate a Viterbi decoder for decoding convolutionally encoded data. The Viterbi decoder is able to correct...
Full text
Efficient Software Decoder Design
In this paper, we evaluate several techniques for generating and optimizing high speed software decoders. We begin by presenting the early stages of a new instruction set description language named ‘Rosetta’. We use specifications written in this language to automatically generate a number of different software decoders. We explore heuristics for generating decoder trees, particularly with rega...
Full text
Part 10: Sub-Object Encoder and Decoder
Sub-objects are sorted out according to the SceneProfile associated with an actual service. Each sub-object of arbitrary shape and size is generally represented by rectangular stereoscopic images with the right shape and required spatial and temporal resolution. A mask for the sub-object determines the transparency for each pixel. The motivation for this sub-division is to avoid progressive lay...
Full text
Molecular Cloud Formation in Shock-compressed Layers
We investigate the propagation of a shock wave into a warm neutral medium and cold neutral medium by one-dimensional hydrodynamic calculations with detailed treatment of thermal and chemical processes. Our main result shows that thermal instability inside the shock-compressed layer produces a geometrically thin, dense layer in which a large amount of hydrogen molecules is formed. Linear stabili...
Full text
Journal
Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence
Year: 2021
ISSN: 2159-5399, 2374-3468
DOI: https://doi.org/10.1609/aaai.v35i15.17572